A new duration modeling approach for Mandarin speech
Abstract
In this paper, a new duration modeling approach for Mandarin speech is proposed. It explicitly models several major affecting factors as multiplicative companding factors (CFs) and estimates all model parameters with an EM algorithm. In addition, the three basic Tone 3 patterns (full tone, half tone, and sandhi tone) are handled by assigning them three different CFs, which separates their effects on syllable duration. Experimental results showed that, once the syllable duration model had eliminated the effects of those affecting factors, the variance of the syllable duration dropped from 180.17 to 2.52 frame² (1 frame = 5 ms). Moreover, the estimated CFs of those affecting factors agreed well with our prior linguistic knowledge. Two extensions of the duration modeling method are also examined. One applies the same technique to model initial and final durations. The other replaces the multiplicative model with an additive one. Lastly, a preliminary study applies the proposed model to syllable duration prediction for TTS. Experimental results showed that it outperformed the conventional regression-based prediction method.
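To make the multiplicative idea concrete, here is a minimal Python sketch on synthetic data. It is not the paper's EM procedure: it swaps in a simpler log-domain alternating least-squares fit (backfitting), in which each observed duration is treated as a normalized duration times one CF per affecting factor. The function name `fit_multiplicative_cfs`, the two factors (tone and position), their level counts, and all numeric values are assumptions made purely for illustration.

```python
import numpy as np

def fit_multiplicative_cfs(durations, factors, n_levels, n_iter=50):
    """Hypothetical helper: fit Z_n ~ X_n * prod_j cf_j[factors[n, j]].

    Uses alternating least squares on log-durations, not the EM algorithm
    described in the abstract. Assumes every factor level occurs in the data.
    """
    log_z = np.log(durations)
    log_cfs = [np.zeros(k) for k in n_levels]  # log CF = 0 means CF = 1
    for _ in range(n_iter):
        for j, k in enumerate(n_levels):
            # Remove the current estimates of all factors except factor j.
            other = sum(log_cfs[i][factors[:, i]]
                        for i in range(len(n_levels)) if i != j)
            resid = log_z - other
            for level in range(k):
                mask = factors[:, j] == level
                if mask.any():
                    log_cfs[j][level] = resid[mask].mean()
            # Center the log-CFs so the overall scale stays in the
            # normalized duration (CFs are only defined up to a constant).
            log_cfs[j] -= log_cfs[j].mean()
    cfs = [np.exp(c) for c in log_cfs]
    total = np.ones_like(durations)
    for j, cf in enumerate(cfs):
        total = total * cf[factors[:, j]]
    return cfs, durations / total

# Synthetic data (all values are illustrative assumptions, in frames).
rng = np.random.default_rng(0)
n = 5000
factors = np.column_stack([rng.integers(0, 5, n),   # tone: 5 levels
                           rng.integers(0, 3, n)])  # position: 3 levels
true_tone = np.array([0.8, 1.0, 1.2, 0.9, 1.3])
true_pos = np.array([0.9, 1.0, 1.4])
base = rng.normal(40.0, 1.5, n)  # normalized duration before companding
durations = base * true_tone[factors[:, 0]] * true_pos[factors[:, 1]]

cfs, normalized = fit_multiplicative_cfs(durations, factors, n_levels=(5, 3))
print("variance before:", np.var(durations))
print("variance after: ", np.var(normalized))
```

Dividing each observed duration by the product of its fitted CFs gives the normalized duration, and the drop in variance between the two printed values is the kind of reduction the abstract reports. Replacing the products with sums, and skipping the logarithm, would give the additive variant mentioned above.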
Similar resources
A novel syllable duration modeling approach for Mandarin speech
In this paper, a novel syllable duration modeling approach for Mandarin speech is proposed. It explicitly takes several main affecting factors as multiplicative companding parameters and estimates all model parameters by an EM algorithm. Experimental results showed that the variance of the observed syllable duration was greatly reduced from 183.4 frame (1 frame = 5 ms) to 18.5 frame by eliminat...
Duration modeling and memory optimization in a Mandarin TTS system
Current speech synthesis efforts, both in research and in applications, are dominated by methods based on concatenation of spoken units. New progress in the concatenative text-to-speech (TTS) technology can be made mainly from two directions, either by reducing the memory footprint to integrate the system into embedded system, or by improving the synthesized speech quality in terms of intelligi...
Modeling Duration and Tonal Coarticulation in a Mandarin Chinese Synthesis
We present in this paper the results of a duration study and a tonal coarticulation study designed for the concatenative Mandarin Chinese synthesis system developed at the Dresden University of Technology. It is reported that the duration model and the tonal coarticulation model are the two most important components of the prosody control in Mandarin. The material for the study of the two proso...
Modeling Duration and Intonation in Mandarin Chinese Synthesis with a Neural Network
The prosody control plays an important role in the naturalness of synthesized speech. In previous work, great efforts have been made to generate rule-based or parameter-based prosodic models [6]. In order to capture the complex interaction of different relevant prosodic factors, neural networks were recently employed. This paper presents a new method of learning and modeling duration and intona...
Improved generation of prosodic features in HMM-based Mandarin speech synthesis
The HMM-based Text-to-Speech System can produce high quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the prosodic features, like F0 and duration trajectories, generated by HMM-based speech synthesis are often excessively smoothed and lack prosodic variance. In HMM-based TTS durations are typically modeled statistically using state duration probabili...
Journal: IEEE Trans. Speech and Audio Processing
Volume: 11
Publication date: 2003